Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits
Similar resources
Generalized Thompson Sampling for Contextual Bandits
Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. This empirical success has led to great interest in a theoretical understanding of the heuristic. In this paper, we approach this problem in a way very different from existing efforts. In particular, motivated by the connection between Thompson Sam...
Thompson Sampling for Budgeted Multi-Armed Bandits
Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to the budgeted MAB, where pulling an arm incurs a random cost and the total cost is constrained by a budget. We start with the case of Bernoulli bandits, in which the random rewards (costs) of an arm are independently sampled from a Bernoulli distribution...
Double Thompson Sampling for Dueling Bandits
In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems. As its name suggests, D-TS selects both the first and the second candidates according to Thompson Sampling. Specifically, D-TS maintains a posterior distribution for the preference matrix, and chooses the pair of arms for comparison according to two sets of samples independently drawn from the pos...
Thompson Sampling for Combinatorial Semi-Bandits
We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of O(m log T / ∆min) for TS under general CMAB, where m is the number of arms, T is the time horizon, and ∆min is the minimum gap between the expect...
Analysis of Thompson Sampling for Stochastic Sleeping Bandits
We study a variant of the stochastic multi-armed bandit problem where the set of available arms varies arbitrarily with time (also known as the sleeping bandit problem). We focus on the Thompson Sampling algorithm and consider a regret notion defined with respect to the best available arm. Our main result is an O(log T) regret bound for Thompson Sampling, which generalizes a similar bound known ...
Journal
Journal title: IEEE Transactions on Artificial Intelligence
Year: 2021
ISSN: 2691-4581
DOI: https://doi.org/10.1109/tai.2021.3121653